Abstract

Compressed Counting (CC) [25], based on maximally skewed stable random projections, was recently proposed for estimating the $\alpha$th frequency moments of data streams. When $\Delta = |1-\alpha| \to 0$, [25] provided an algorithm based on the geometric mean estimator and proved that the sample complexity was essentially $O(1/\epsilon)$, which was a large improvement compared to the previously known $O(1/\epsilon^2)$ bound. The case $\Delta = |1-\alpha| \to 0$ is extremely useful for estimating Shannon entropy of data streams.

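In this study, we provide a very simple algorithm based on the sample minimum estimator and prove that, when $\Delta = 1-\alpha \to 0+$, it suffices to let the sample size $k$ be

$$k \ge \frac{\log\frac{1}{\delta}}{\log\frac{1}{\Delta} - \log\left(\frac{1}{2} + \frac{1}{2\log(1+\epsilon)} + \frac{1}{2\Delta\log\Delta + 2\log(1+\epsilon)} + O(\Delta)\right)}$$

so that, with probability at least $1-\delta$, the estimated $\alpha$th frequency moment will be within a $1+\epsilon$ factor of the truth. For example, for suitably small $\epsilon$, $\delta$, and $\Delta$ (each a negative power of ten), the required sample size is merely $k \ge 5.1$.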



1 Introduction 

The problem of "scaling up for high dimensional data and high speed data streams" is among the "ten challenging problems in data mining research" [36]. This paper is devoted to estimating entropy of data streams. Mining data streams [19, 4, 1, 29] in (e.g.,) 100 TB scale databases has become an important area of research, e.g., [10, 1], as network data can easily reach that scale [36]. Search engines are a typical source of data streams [4].

Consider the Turnstile stream model [29]. The input stream $a_t = (i_t, I_t)$, $i_t \in [1, D]$, arriving sequentially describes the underlying signal $A$, meaning

$$A_t[i_t] = A_{t-1}[i_t] + I_t, \tag{1}$$

where the increment $I_t$ can be either positive (insertion) or negative (deletion). Restricting $A_t[i] \ge 0$ results in the strict-Turnstile model, which suffices for describing almost all natural phenomena. This study focuses on the strict-Turnstile model and studies efficient algorithms for estimating the $\alpha$th frequency moments of data streams

$$F_{(\alpha)} = \sum_{i=1}^{D} A_t[i]^\alpha. \tag{2}$$

We are particularly interested in the case of $\alpha \to 1$, which is very important for estimating Shannon entropy.
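As a concrete illustration of the model in (1) and the quantity in (2), the following minimal sketch applies strict-Turnstile updates and computes the frequency moment exactly (that is, without any sketching). The function names and the toy stream are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def apply_stream(stream):
    """Apply strict-Turnstile updates (i_t, I_t) as in (1); returns the signal A_t."""
    A = defaultdict(float)
    for i, increment in stream:
        A[i] += increment
        if A[i] < 0:
            raise ValueError("strict-Turnstile model requires A_t[i] >= 0")
    return A

def frequency_moment(A, alpha):
    """Exact F_(alpha) = sum_i A_t[i]^alpha as in (2), skipping zero entries."""
    return sum(v ** alpha for v in A.values() if v > 0)

# Hypothetical toy stream: insertions into items 1, 2, 3 and one deletion from item 1.
stream = [(1, 3), (2, 5), (1, -1), (3, 2)]
A = apply_stream(stream)
print(frequency_moment(A, 1.0))    # first moment (the sum)
print(frequency_moment(A, 0.999))  # a moment with alpha close to 1
```

Storing $A_t$ explicitly, as done here, is exactly what streaming algorithms try to avoid when $D$ is large; the sketch only fixes the meaning of the target quantity.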

' Extended abstract, submitted on July 6, 2009. 
1.1 Entropy

A very useful (e.g., in Web and networks [12, 23, 37, 27] and neural computations [30]) summary statistic is the Shannon entropy

$$H = -\sum_{i=1}^{D} \frac{A_t[i]}{F_{(1)}} \log \frac{A_t[i]}{F_{(1)}}. \tag{3}$$

Various generalizations of the Shannon entropy have been proposed. The Rényi entropy [31], denoted by $H_\alpha$, and the Tsallis entropy [18, 33], denoted by $T_\alpha$, are respectively defined as

$$H_\alpha = \frac{1}{1-\alpha} \log \frac{\sum_{i=1}^{D} A_t[i]^\alpha}{\left(\sum_{i=1}^{D} A_t[i]\right)^\alpha}, \qquad T_\alpha = \frac{1}{\alpha-1}\left(1 - \frac{F_{(\alpha)}}{F_{(1)}^\alpha}\right). \tag{4}$$

As $\alpha \to 1$, both Rényi entropy and Tsallis entropy converge to Shannon entropy: $\lim_{\alpha\to1} H_\alpha = \lim_{\alpha\to1} T_\alpha = H$. Thus, both Rényi entropy and Tsallis entropy can be computed from the $\alpha$th frequency moment; and one can approximate Shannon entropy from either $H_\alpha$ or $T_\alpha$ by letting $\alpha \approx 1$. Several studies [37, 17, 16] used this idea to approximate Shannon entropy, all of which relied on efficient algorithms for estimating the $\alpha$th frequency moments (2) near $\alpha = 1$. In fact, one can numerically verify that the $\alpha$ values proposed in [17, 16] are extremely close to 1, i.e., $\Delta = |1-\alpha|$ is a very small negative power of ten.

Therefore, efficient algorithms for estimating $F_{(\alpha)}$ near $\alpha = 1$ are critical for estimating Shannon entropy.
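The convergence of the Rényi and Tsallis entropies to the Shannon entropy can be checked numerically. The sketch below assumes the standard definitions of these entropies for the empirical distribution $A_t[i]/F_{(1)}$; the counts and function names are hypothetical.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy of the empirical distribution counts[i] / sum(counts)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def renyi_entropy(counts, alpha):
    """Renyi entropy computed from the first and the alpha-th frequency moments."""
    f1 = sum(counts)
    fa = sum(c ** alpha for c in counts if c > 0)
    return math.log(fa / f1 ** alpha) / (1.0 - alpha)

def tsallis_entropy(counts, alpha):
    """Tsallis entropy computed from the first and the alpha-th frequency moments."""
    f1 = sum(counts)
    fa = sum(c ** alpha for c in counts if c > 0)
    return (1.0 - fa / f1 ** alpha) / (alpha - 1.0)

counts = [2, 5, 2]  # hypothetical A_t[i]
for delta in (1e-1, 1e-3, 1e-5):
    alpha = 1.0 - delta
    print(delta, shannon_entropy(counts),
          renyi_entropy(counts, alpha), tsallis_entropy(counts, alpha))
```

Both approximations differ from $H$ by a term of order $\Delta$, which is why a small $\Delta$ (and hence an accurate estimate of $F_{(\alpha)}$) is needed.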



1.2 Sample Applications of Shannon Entropy 
1.2.1 Real-Time Network Anomaly Detection 

Network traffic is a typical example of high-rate data streams. An effective and reliable measurement of network 
traffic in real-time is crucial for anomaly detection and network diagnosis; and one such measurement metric is 
Shannon entropy [12, 22, 35, 7, 23, 37]. The Turnstile data stream model (1) is naturally suitable for describing
network traffic, especially when the goal is to characterize the statistical distribution of the traffic. In its empirical
form, a statistical distribution is described by histograms, $A_t[i]$, $i = 1$ to $D$. It is possible that $D = 2^{64}$ (IPV6) if
one is interested in measuring the traffic streams of unique source or destination. 

The Distributed Denial of Service (DDoS) attack is a representative example of network anomalies. A DDoS 
attack attempts to make computers unavailable to intended users, either by forcing users to reset the computers 
or by exhausting the resources of service-hosting sites. For example, hackers may maliciously saturate the victim 
machines by sending many external communication requests. DDoS attacks typically target sites such as banks, 
credit card payment gateways, or military sites. 

A DDoS attack changes the statistical distribution of network traffic. Therefore, a common practice to detect 
an attack is to monitor the network traffic using certain summary statistics. Since Shannon entropy is well-suited
for characterizing a distribution, a popular detection method is to measure the time-history of entropy and raise an alarm
when the entropy becomes abnormal [12, 23].

Entropy measurements do not have to be "perfect" for detecting attacks. It is however crucial that the algorithm 
should be computationally efficient at low memory cost, because the traffic data generated by large high-speed 
networks are enormous and transient (e.g., 1 Gbit/second). Algorithms should be real-time and one-pass, as
the traffic data will not be stored [4]. Many algorithms have been proposed for "sampling" the traffic data and
estimating entropy over data streams [23, 37, 6, 15, 3, 8, 17, 16].



1.2.2 Entropy of Query Logs in Web Search 

The recent work [27] was devoted to estimating the Shannon entropy of MSN search logs, to help answer some
basic problems in Web search, such as, how big is the web?

The search logs can be viewed as data streams, and [27] analyzed several "snapshots" of a sample of MSN
search logs. The sample used in [27] contained 10 million <Query, URL, IP> triples; each triple corresponded
to a click from a particular IP address on a particular URL for a particular query. [27] drew their important
conclusions on this (hopefully) representative sample. Alternatively, one could apply data stream algorithms such 
as CC on the whole history of MSN (or other search engines). 
1.2.3 Entropy in Neural Computations

A workshop in NIPS '03 was devoted to entropy estimation, owing to the wide-spread use of Shannon entropy in
Neural Computations [30] (http://www.menem.com/~ilya/pages/NIPS03/). For example, one application of entropy is to study the underlying structure of spike trains.

1.3 Previous Algorithms for Estimating Frequency Moments 

The problem of approximating $F_{(\alpha)}$ has been very heavily studied in theoretical computer science and databases, since the pioneering work of [2], which studied $\alpha = 0$, $2$, and $\alpha > 2$. [11, 20, 24] provided improved algorithms for $0 < \alpha \le 2$. [21] provided algorithms for $\alpha > 2$ to achieve the lower bounds proved by [32, 5, 34]. [14] suggested using even more space to trade for some speedup in the processing time.

Note that the first moment (i.e., the sum) can be computed easily with a simple counter [28, 13, 2]. This important property was recently somewhat captured by the method of Compressed Counting (CC) [25], which was based on maximally-skewed stable random projections. [25] proved that, in the neighborhood of $\alpha = 1$, the sample complexity is essentially $O(1/\epsilon)$, which was a large improvement over the well-known $O(1/\epsilon^2)$ bound [34, 20, 24]. This means the required sample size using CC should be $O(1/\epsilon)$ in order to ensure that the estimated $\alpha$th frequency moment will be within a $1 \pm \epsilon$ factor of the truth, with high probability.

The sample complexity bound of $O(1/\epsilon)$ for CC is unsatisfactory, not just for theoretical reasons. From a practical point of view, $1/\epsilon$ can be too large to be practical, especially for entropy estimation. For example, one can numerically verify that the required $\epsilon$ values in [17, 16] for entropy estimation are very small. Very recently, without providing any theoretical complexity bounds, [26] proposed an empirically improved (and quite sophisticated) algorithm for CC. Because the algorithm in [26] is quite complex, its theoretical analysis was difficult.

This study proposes a very simple algorithm, which also allows us to analyze its sample complexity. The complexity is essentially $O\left(\frac{1}{\log(1/\Delta) - \log(1/\epsilon)}\right)$, when $\Delta = 1-\alpha \to 0$.

2 The Proposed Algorithm and Main Theoretical Results 

We consider the strict-Turnstile model (1). Conceptually, we multiply the data stream vector $A_t \in \mathbb{R}^{D}$ by a random projection matrix $R \in \mathbb{R}^{D \times k}$. The resultant vector $X = A_t \times R \in \mathbb{R}^{k}$ is only of length $k$. More specifically, the entries of the projected vector $X$ are

$$x_j = [A_t \times R]_j = \sum_{i=1}^{D} r_{ij} A_t[i], \qquad j = 1, 2, \ldots, k.$$

The $r_{ij}$'s are random variables generated by

$$r_{ij} = \frac{\sin(\alpha v_{ij})}{[\sin v_{ij}]^{1/\alpha}} \left[\frac{\sin(v_{ij}\Delta)}{w_{ij}}\right]^{\Delta/\alpha}, \qquad \Delta = 1-\alpha > 0, \tag{5}$$

where $v_{ij} \sim uniform(0, \pi)$ (i.i.d.) and $w_{ij} \sim exp(1)$ (i.i.d.), an exponential distribution with mean 1.

Of course, in data stream computations, the matrix $R$ is never fully materialized. The standard procedure in data stream computations is to generate entries of $R$ on-demand [20]. In other words, whenever a stream element $a_t = (i_t, I_t)$ arrives, one updates entries of $X$ as

$$x_j \leftarrow x_j + I_t r_{i_t j}, \qquad j = 1, 2, \ldots, k.$$

The proposed algorithm is to take the sample minimum:

$$\hat{F}_{(\alpha),\min} = \left[\min\{x_j, \ j = 1, 2, \ldots, k\}\right]^{\alpha}. \tag{6}$$

While this estimator is extremely simple, it has nice theoretical properties.
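The procedure described in this section (on-demand generation of $r_{ij}$ from (5), linear updates of $X$, and the sample minimum estimator (6)) can be sketched as follows. This is a minimal illustration rather than a tuned implementation: the seeding scheme used to regenerate $r_{ij}$ deterministically, the function names, and the toy stream are all assumptions made for the example.

```python
import math
import random

def r_entry(i, j, alpha, seed=0):
    """Regenerate r_ij from formula (5). The integer seeding scheme is a
    hypothetical choice so the same (i, j) always yields the same value."""
    rng = random.Random(seed * 1_000_003 + i * 10_007 + j)
    delta = 1.0 - alpha
    v = math.pi * (1.0 - rng.random())   # uniform on (0, pi]
    w = rng.expovariate(1.0)             # exponential with mean 1
    return (math.sin(alpha * v) / math.sin(v) ** (1.0 / alpha)
            * (math.sin(v * delta) / w) ** (delta / alpha))

def cc_sketch(stream, k, alpha, seed=0):
    """Maintain x_j <- x_j + I_t * r_{i_t, j} for every arriving (i_t, I_t)."""
    x = [0.0] * k
    for i, increment in stream:
        for j in range(k):
            x[j] += increment * r_entry(i, j, alpha, seed)
    return x

def estimate_min(x, alpha):
    """Sample minimum estimator (6)."""
    return min(x) ** alpha

# Hypothetical toy stream whose final signal is A = {1: 2, 2: 5, 3: 2}.
stream = [(1, 3), (2, 5), (1, -1), (3, 2)]
alpha, k = 0.999, 20
estimate = estimate_min(cc_sketch(stream, k, alpha), alpha)
exact = 2 ** alpha + 5 ** alpha + 2 ** alpha
print(estimate, exact)
```

Regenerating $r_{i_t j}$ from a seed on every update trades computation for memory; a real system would likely use a faster pseudo-random generator than the one assumed here.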
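Theorem 1 As $\Delta = 1-\alpha \to 0+$, for any fixed $\epsilon > 0$,

$$\Pr\left(\hat{F}_{(\alpha),\min} \ge (1+\epsilon)F_{(\alpha)}\right) \le \exp\left(k \log\left(\frac{1}{2}\left[\Delta + \frac{\Delta}{\log(1+\epsilon)} + \frac{\Delta}{\Delta\log\Delta + \log(1+\epsilon)}\right] + O\left(\Delta^2\right)\right)\right). \tag{7}$$

Therefore, it suffices to let the sample size

$$k \ge \frac{\log\frac{1}{\delta}}{\log\frac{1}{\Delta} - \log\left(\frac{1}{2} + \frac{1}{2\log(1+\epsilon)} + \frac{1}{2\Delta\log\Delta + 2\log(1+\epsilon)} + O(\Delta)\right)} \tag{8}$$

so that with probability at least $1-\delta$, $\hat{F}_{(\alpha),\min}$ is within a $1+\epsilon$ factor of $F_{(\alpha)}$.

The proof is deferred to Section 4.2, which will also demonstrate that the right tail bound (7) can be slightly improved.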
Figure 1: Right tail bound (7) for selected $\Delta$ and $k$, together with the simulated tail probabilities.

To help verify the results in Theorem 1, Figure 1 plots the right tail bounds (7) for one small value of $\Delta$ ($k = 1, 2, 3$) and for a smaller value of $\Delta$ ($k = 1$ only), together with the simulated tail probabilities. We can see that the tail probabilities decrease very rapidly. In fact, it is even difficult to simulate the tail probabilities if $k > 3$ or if $\Delta$ is below the smaller of these two values.

Theorem 1 indicates that the required sample size $k$ can be very small. For example, for suitably small $\epsilon$, $\delta$, and $\Delta$ (each a negative power of ten), according to (8), the required sample size is merely $k \ge 5.1$.
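To get a feel for the magnitudes in (7) and (8), the following sketch evaluates both expressions with the $O(\Delta^2)$ and $O(\Delta)$ terms dropped. The parameter values $\epsilon = 10^{-3}$, $\delta = 10^{-10}$, $\Delta = 10^{-5}$ are assumptions chosen for illustration; with them the formula evaluates to roughly $5.07$. The function and argument names are hypothetical, and the formulas are only meaningful when $\Delta\log\Delta + \log(1+\epsilon) > 0$.

```python
import math

def right_tail_bound(k, eps, delta_alpha):
    """Leading term of the right tail bound (7), with O(Delta^2) dropped."""
    log1pe = math.log1p(eps)
    inner = 0.5 * (delta_alpha
                   + delta_alpha / log1pe
                   + delta_alpha / (delta_alpha * math.log(delta_alpha) + log1pe))
    return inner ** k

def required_sample_size(eps, fail_prob, delta_alpha):
    """Right-hand side of (8), with the O(Delta) term dropped."""
    log1pe = math.log1p(eps)
    inner = (0.5 + 1.0 / (2.0 * log1pe)
             + 1.0 / (2.0 * delta_alpha * math.log(delta_alpha) + 2.0 * log1pe))
    return math.log(1.0 / fail_prob) / (math.log(1.0 / delta_alpha) - math.log(inner))

# Assumed illustrative values: eps = 1e-3, failure probability 1e-10, Delta = 1e-5.
k_min = required_sample_size(eps=1e-3, fail_prob=1e-10, delta_alpha=1e-5)
print(k_min)
print(right_tail_bound(5, 1e-3, 1e-5), right_tail_bound(6, 1e-3, 1e-5))
```

Since $k$ must be an integer, a bound slightly above 5 translates to $k = 6$ samples under these assumed values.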

Note that Theorem 1 is just for the sample complexity. To obtain the space complexity, we must consider a multiplicative factor of $\log \sum_{s=1}^{t} |I_s|$. In addition, we must store $r_{ij}$ with a sufficient accuracy. In Section 3, Lemma 1 shows that $\log r_{ij} = O(|\Delta \log \Delta|)$, which can be represented using $O(\log 1/\Delta)$ bits. Therefore, the required storage space would be the sample complexity $k$ multiplied by a factor of $O\left(\log \sum_{s=1}^{t} |I_s|\right) + O(\log 1/\Delta)$.
Theorem 2 presents the left tail bound.

Theorem 2 For any $0 < \epsilon < 1$, $\alpha < 1$, and $\Delta = 1-\alpha$,

$$\Pr\left(\hat{F}_{(\alpha),\min} \le (1-\epsilon)F_{(\alpha)}\right) \le k \exp\left(-\frac{\Delta\alpha^{1/\Delta-1}}{(1-\epsilon)^{1/\Delta}}\right). \tag{9}$$

The proof is deferred to Section 4.1.

The left bound (9) approaches zero extremely fast. For example, when $\Delta$ is small relative to $\epsilon$, the quantity $\frac{\Delta\alpha^{1/\Delta-1}}{(1-\epsilon)^{1/\Delta}}$ inside the exponential is astronomically large; and hence $k$ does not really matter for the left bound. In a sense, the left bound will be used merely for the sanity check and one can determine the sample size mainly from the right bound in Theorem 1.
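The size of the exponent in (9) can be evaluated in log space, since the quantity itself may overflow ordinary floating point for smaller $\Delta$. The sketch below uses the assumed illustrative values $\epsilon = 10^{-3}$ and $\Delta = 10^{-5}$; the function name is hypothetical.

```python
import math

def log10_left_exponent(eps, delta_alpha):
    """log10 of Delta * alpha^(1/Delta - 1) / (1 - eps)^(1/Delta), the exponent in (9)."""
    inv = 1.0 / delta_alpha
    return (math.log10(delta_alpha)
            + (inv - 1.0) * math.log1p(-delta_alpha) / math.log(10.0)
            - inv * math.log1p(-eps) / math.log(10.0))

# Assumed illustrative values: eps = 1e-3, Delta = 1e-5.
print(log10_left_exponent(eps=1e-3, delta_alpha=1e-5))
```

With these assumed values the exponent is on the order of $10^{38}$, so $k\exp(-\cdot)$ is numerically zero for any realistic $k$.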

3 Preparation for the Proofs of the Main Results 

We start with reviewing maximally-skewed stable distributions, because our formulation (5) somewhat differs
from the standard formulation.

3.1 Maximally-Skewed Stable Distribution

The standard procedure for sampling from skewed stable distributions is based on the Chambers-Mallows-Stuck method [9]. To generate a sample from $S(\alpha, \beta = 1, 1)$, i.e., $\alpha$-stable, maximally-skewed ($\beta = 1$), with unit scale, one first generates an exponential random variable with mean 1, $W \sim exp(1)$, and a uniform random variable $U \sim uniform\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$; then

$$Z' = \frac{\sin(\alpha(U+\rho))}{[\cos U \cos(\rho\alpha)]^{1/\alpha}} \left[\frac{\cos(U - \alpha(U+\rho))}{W}\right]^{\frac{1-\alpha}{\alpha}} \sim S(\alpha, \beta = 1, 1), \tag{10}$$

where $\rho = \frac{\pi}{2}$ when $\alpha < 1$ and $\rho = \frac{\pi}{2}\frac{2-\alpha}{\alpha}$ when $\alpha > 1$.
For convenience, we will use

$$Z = Z' \cos^{1/\alpha}(\rho\alpha) \sim S(\alpha, \beta = 1, \cos(\rho\alpha)).$$
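In this study, we will only consider $\alpha = 1-\Delta < 1$, i.e., $\rho = \frac{\pi}{2}$. After simplification, we obtain

$$Z = \frac{\sin(\alpha V)}{[\sin V]^{1/\alpha}} \left[\frac{\sin(V\Delta)}{W}\right]^{\Delta/\alpha}, \tag{11}$$

where $V = \frac{\pi}{2} + U \sim uniform(0, \pi)$. This explains (5).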
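The simplification from (10) to (11) is a deterministic identity for each pair $(U, W)$, so it can be checked numerically. The sketch below assumes $\alpha < 1$ (hence $\rho = \pi/2$); the function names and test points are hypothetical.

```python
import math

def cms_rescaled(alpha, u, w):
    """Z' from (10) with rho = pi/2 (alpha < 1), multiplied by cos(rho*alpha)^(1/alpha)."""
    rho = math.pi / 2.0
    z_prime = (math.sin(alpha * (u + rho))
               / (math.cos(u) * math.cos(rho * alpha)) ** (1.0 / alpha)
               * (math.cos(u - alpha * (u + rho)) / w) ** ((1.0 - alpha) / alpha))
    return z_prime * math.cos(rho * alpha) ** (1.0 / alpha)

def simplified(alpha, u, w):
    """Z from (11), using V = pi/2 + U and Delta = 1 - alpha."""
    delta = 1.0 - alpha
    v = u + math.pi / 2.0
    return (math.sin(alpha * v) / math.sin(v) ** (1.0 / alpha)
            * (math.sin(v * delta) / w) ** (delta / alpha))

# Hypothetical test points (alpha, U, W) with |U| < pi/2 and W > 0.
points = [(0.9, 0.3, 0.7), (0.5, -1.0, 2.0), (0.99, 1.2, 0.1)]
for alpha, u, w in points:
    print(cms_rescaled(alpha, u, w), simplified(alpha, u, w))
```

The identity rests on $\cos U = \sin V$ and $\cos(U - \alpha V) = \sin(V\Delta)$ when $V = U + \pi/2$.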



Lemma 1 shows $\log Z = O(|\Delta \log \Delta|)$, which can be accurately represented using $O(\log 1/\Delta)$ bits. The proof is omitted since it is straightforward.

Lemma 1 For any given $V \ne 0$ and $W \ne 0$, as $\Delta \to 0$,

$$Z = 1 + O(|\Delta\log\Delta|), \quad \text{i.e.,} \quad \log Z = O(|\Delta\log\Delta|).$$

3.2 Random Projections and the Sample Minimum Estimator 

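Let $X = A_t \times R$, where entries of $R$ are i.i.d. samples of $S\left(\alpha, \beta = 1, \cos\left(\frac{\pi}{2}\alpha\right)\right)$. Then by properties of stable distributions, entries of $X$ are

$$x_j = [A_t \times R]_j = \sum_{i=1}^{D} r_{ij} A_t[i] \sim S\left(\alpha, \beta = 1, \cos\left(\frac{\pi}{2}\alpha\right) F_{(\alpha)}\right),$$

where $F_{(\alpha)} = \sum_{i=1}^{D} A_t[i]^\alpha$ as defined in (2).

The proposed estimator of $F_{(\alpha)}$ is based on the sample minimum:

$$\hat{F}_{(\alpha),\min} = \left[\min\{x_j, \ j = 1, 2, \ldots, k\}\right]^{\alpha}.$$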
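3.3 Density Function

Lemma 2 Suppose a random variable $Z \sim S\left(\alpha < 1, \beta = 1, \cos\left(\frac{\pi}{2}\alpha\right)\right)$, then the cumulative distribution function is

$$F_Z(t) = \Pr(Z \le t) = \frac{1}{\pi}\int_0^{\pi} \exp\left(-\frac{[\sin(\alpha\theta)]^{\alpha/\Delta}}{t^{\alpha/\Delta}[\sin\theta]^{1/\Delta}} \sin(\theta\Delta)\right) d\theta, \qquad (\Delta = 1-\alpha).$$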



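Proof:

$$\begin{aligned}
\Pr(Z > t) &= \Pr\left(\frac{\sin(\alpha V)}{[\sin V]^{1/\alpha}} \left[\frac{\sin(V\Delta)}{W}\right]^{\Delta/\alpha} > t\right) \\
&= \Pr\left(\frac{[\sin(\alpha V)]^{\alpha/\Delta}}{[\sin V]^{1/\Delta}} \frac{\sin(V\Delta)}{W} > t^{\alpha/\Delta}\right) \\
&= \Pr\left(W < \frac{[\sin(\alpha V)]^{\alpha/\Delta}}{t^{\alpha/\Delta}[\sin V]^{1/\Delta}} \sin(V\Delta)\right) \\
&= 1 - E\left(\exp\left(-\frac{[\sin(\alpha V)]^{\alpha/\Delta}}{t^{\alpha/\Delta}[\sin V]^{1/\Delta}} \sin(V\Delta)\right)\right) \\
&= 1 - \frac{1}{\pi}\int_0^{\pi} \exp\left(-\frac{[\sin(\alpha\theta)]^{\alpha/\Delta}}{t^{\alpha/\Delta}[\sin\theta]^{1/\Delta}} \sin(\theta\Delta)\right) d\theta.
\end{aligned}$$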
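The distribution function in Lemma 2 can be sanity-checked by simulation: draw $Z$ via (11) and compare the empirical fraction of draws not exceeding $t$ with a numerical evaluation of the integral. The sketch below uses a moderate $\alpha = 0.9$ and $t = 1$ purely for illustration (these values, the sample sizes, and the function names are assumptions); the integrand is evaluated in log space to avoid overflow.

```python
import math
import random

def sample_z(alpha, rng):
    """One draw of Z from (11)."""
    delta = 1.0 - alpha
    v = math.pi * (1.0 - rng.random())   # uniform on (0, pi]
    w = rng.expovariate(1.0)             # exponential with mean 1
    return (math.sin(alpha * v) / math.sin(v) ** (1.0 / alpha)
            * (math.sin(v * delta) / w) ** (delta / alpha))

def cdf_z(t, alpha, n=20000):
    """Midpoint-rule evaluation of the integral in Lemma 2."""
    delta = 1.0 - alpha
    total = 0.0
    for m in range(n):
        theta = math.pi * (m + 0.5) / n
        log_h = ((alpha / delta) * math.log(math.sin(alpha * theta))
                 + math.log(math.sin(theta * delta))
                 - (alpha / delta) * math.log(t)
                 - math.log(math.sin(theta)) / delta)
        # Cap the exponent so math.exp cannot overflow; exp(-huge) is 0 anyway.
        total += math.exp(-math.exp(min(log_h, 700.0)))
    return total / n

rng = random.Random(0)
alpha, t = 0.9, 1.0
draws = [sample_z(alpha, rng) for _ in range(20000)]
empirical = sum(z <= t for z in draws) / len(draws)
print(empirical, cdf_z(t, alpha))
```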







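For $\theta \in (0, \pi)$, let

$$g(\theta; \Delta) = \frac{[\sin(\alpha\theta)]^{\alpha/\Delta}}{[\sin\theta]^{1/\Delta}} \sin(\theta\Delta).$$

Lemma 3 includes some properties of $g(\theta; \Delta)$, which will be useful for proving our main results in Theorem 1 and Theorem 2.

Lemma 3 Assume $\Delta = 1-\alpha < 0.5$, then $g(\theta; \Delta)$ is monotonically increasing in $(0, \pi)$, with

$$\lim_{\theta \to 0+} g(\theta; \Delta) = \Delta\alpha^{1/\Delta - 1}.$$

Moreover, $g(\theta; \Delta)$ is a convex function of $\theta$.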
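The three properties stated in Lemma 3 can be checked numerically on a grid. The sketch below is not a proof; it assumes $\Delta = 0.1$ and a uniform grid with spacing $0.01$ (both arbitrary choices), and evaluates $g$ in log space because the factor $[\sin\theta]^{-1/\Delta}$ grows very quickly near $\pi$.

```python
import math

def g(theta, delta):
    """g(theta; Delta), evaluated through its logarithm for numerical stability."""
    alpha = 1.0 - delta
    log_g = ((alpha / delta) * math.log(math.sin(alpha * theta))
             + math.log(math.sin(theta * delta))
             - math.log(math.sin(theta)) / delta)
    return math.exp(log_g)

delta = 0.1                                   # assumed value with Delta < 0.5
grid = [0.01 * m for m in range(1, 314)]      # theta in (0, pi)
values = [g(theta, delta) for theta in grid]
first = [b - a for a, b in zip(values, values[1:])]
second = [c - 2.0 * b + a for a, b, c in zip(values, values[1:], values[2:])]
limit = delta * (1.0 - delta) ** (1.0 / delta - 1.0)

print(min(first) > 0, min(second) > 0)        # increasing and convex on the grid
print(g(1e-6, delta), limit)                  # value near 0 versus the stated limit
```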

4 Proofs of Theorem 1 and Theorem 2

We first prove the left bound in Theorem 2.
4.1 Proof of Theorem 2

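Recall the sample minimum estimator is

$$\hat{F}_{(\alpha),\min} = \left[\min\{x_j, \ j = 1, 2, \ldots, k\}\right]^{\alpha}, \qquad x_j \sim S\left(\alpha < 1, \beta = 1, \cos\left(\frac{\pi}{2}\alpha\right) F_{(\alpha)}\right).$$

Using the density function provided in Lemma 2 and properties of $g(\theta; \Delta) = \frac{[\sin(\alpha\theta)]^{\alpha/\Delta}}{[\sin\theta]^{1/\Delta}} \sin(\theta\Delta)$ proved in Lemma 3, we obtain

$$\begin{aligned}
\Pr\left(\hat{F}_{(\alpha),\min} \le (1-\epsilon)F_{(\alpha)}\right)
&\le k \times \Pr\left(x_j^{\alpha}/F_{(\alpha)} \le (1-\epsilon)\right) \\
&= k \frac{1}{\pi}\int_0^{\pi} \exp\left(-\frac{[\sin(\alpha\theta)]^{\alpha/\Delta}}{(1-\epsilon)^{1/\Delta}[\sin\theta]^{1/\Delta}} \sin(\theta\Delta)\right) d\theta \\
&= k \frac{1}{\pi}\int_0^{\pi} \exp\left(-\frac{g(\theta; \Delta)}{(1-\epsilon)^{1/\Delta}}\right) d\theta \\
&\le k \frac{1}{\pi}\int_0^{\pi} \exp\left(-\frac{\Delta\alpha^{1/\Delta-1}}{(1-\epsilon)^{1/\Delta}}\right) d\theta \\
&= k \exp\left(-\frac{\Delta\alpha^{1/\Delta-1}}{(1-\epsilon)^{1/\Delta}}\right).
\end{aligned}$$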

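4.2 Proof of Theorem 1

Using the density function provided in Lemma 2, we can obtain

$$\begin{aligned}
\Pr\left(\hat{F}_{(\alpha),\min} \ge (1+\epsilon)F_{(\alpha)}\right)
&= \Pr\left(\hat{F}_{(\alpha),\min}/F_{(\alpha)} \ge (1+\epsilon)\right) \\
&= \left[1 - \frac{1}{\pi}\int_0^{\pi} \exp\left(-\frac{[\sin(\alpha\theta)]^{\alpha/\Delta}}{(1+\epsilon)^{1/\Delta}[\sin\theta]^{1/\Delta}} \sin(\theta\Delta)\right) d\theta\right]^{k} \\
&= \exp\left(k \log\left(1 - \frac{1}{\pi}\int_0^{\pi} \exp\left(-\frac{g(\theta; \Delta)}{(1+\epsilon)^{1/\Delta}}\right) d\theta\right)\right).
\end{aligned}$$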



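We proceed with the proof as follows:

1. Using the fact that $e^{-x} \ge \max\{0, 1-x\}$, we obtain

$$\Pr\left(\hat{F}_{(\alpha),\min} \ge (1+\epsilon)F_{(\alpha)}\right) \le \exp\left(k \log\left(1 - \frac{1}{\pi}\int_0^{\theta_0} \left(1 - \frac{g(\theta; \Delta)}{(1+\epsilon)^{1/\Delta}}\right) d\theta\right)\right),$$

where $\theta_0$ is the solution to

$$1 = \frac{g(\theta; \Delta)}{(1+\epsilon)^{1/\Delta}}.$$

2. We prove a more general result to solve for $\theta_\gamma$ in

$$\Delta^{\gamma} = \frac{g(\theta; \Delta)}{(1+\epsilon)^{1/\Delta}}.$$

We show the asymptotic expression for $\theta_\gamma$ is, as $\Delta \to 0$,

$$\theta_\gamma = \pi - \pi \frac{\Delta}{\Delta + \gamma\Delta\log\Delta + \log(1+\epsilon) + \Delta\log\left(\frac{1}{\gamma\Delta\log\Delta + \log(1+\epsilon)} + 1\right) + O\left(\Delta^2\right)}. \tag{12}$$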
3. We approximate the integral $\int_0^{\theta_0} \left(1 - \frac{g(\theta; \Delta)}{(1+\epsilon)^{1/\Delta}}\right) d\theta$ by the trapezoid rule. Because $g(\theta; \Delta)$ is a convex function of $\theta$ as proved in Lemma 3, we know this approximation still leads to an upper bound we are after.

4. To apply the trapezoid rule, it turns out that it suffices to use only one interior point, $\theta = \theta_1$, in addition to the two end points, $\theta = \theta_\infty$ and $\theta = \theta_0$. $\theta_1$ is the solution to $\Delta = \frac{g(\theta; \Delta)}{(1+\epsilon)^{1/\Delta}}$.

5. We can slightly improve the bound by using more points when applying the trapezoid rule, for example, $\theta = \theta_{1/2}$, in addition to $\theta_0$, $\theta_1$, and $\theta_\infty$.
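The steps above can be explored numerically. The sketch below solves for $\theta_0$ and $\theta_1$ by bisection, compares them with the asymptotic expression (12) (with the $O(\Delta^2)$ term dropped), and compares the one-interior-point trapezoid bound $1 - \frac{1}{2\pi}[\theta_0 + \theta_1 - \Delta\theta_0]$ with a direct numerical evaluation of the tail probability for $k = 1$. The values $\Delta = 10^{-5}$ and $\epsilon = 10^{-3}$, the grid size, and all function names are assumptions made for illustration.

```python
import math

def log_h(theta, delta, eps):
    """log of g(theta; Delta) / (1 + eps)^(1/Delta)."""
    alpha = 1.0 - delta
    return ((alpha / delta) * math.log(math.sin(alpha * theta))
            + math.log(math.sin(theta * delta))
            - math.log(math.sin(theta)) / delta
            - math.log1p(eps) / delta)

def solve_theta(gamma, delta, eps, iters=200):
    """Bisection for theta_gamma, the root of log_h = gamma * log(Delta)."""
    lo, hi = 1e-9, math.pi - 1e-12
    target = gamma * math.log(delta)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_h(mid, delta, eps) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def theta_asymptotic(gamma, delta, eps):
    """Equation (12) with the O(Delta^2) term dropped."""
    c = gamma * delta * math.log(delta) + math.log1p(eps)
    return math.pi - math.pi * delta / (delta + c + delta * math.log(1.0 / c + 1.0))

def exact_right_tail(delta, eps, n=200000):
    """1 - (1/pi) * integral over (0, pi) of exp(-h), by the midpoint rule."""
    total = 0.0
    for m in range(n):
        theta = math.pi * (m + 0.5) / n
        # 1 - exp(-h), with the exponent capped so math.exp cannot overflow.
        total += -math.expm1(-math.exp(min(log_h(theta, delta, eps), 700.0)))
    return total / n

delta, eps = 1e-5, 1e-3   # assumed illustrative values
theta0 = solve_theta(0.0, delta, eps)
theta1 = solve_theta(1.0, delta, eps)
trapezoid = 1.0 - (theta0 + theta1 - delta * theta0) / (2.0 * math.pi)
print(theta0, theta_asymptotic(0.0, delta, eps))
print(theta1, theta_asymptotic(1.0, delta, eps))
print(exact_right_tail(delta, eps), trapezoid)
```

The bisection relies on the monotonicity of $g(\theta; \Delta)$ from Lemma 3; the trapezoid value should sit slightly above the numerically integrated tail, reflecting that it is an upper bound.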

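We defer the proof of (12) to Appendix B. Assuming (12) holds, we have

$$\begin{aligned}
&\Pr\left(\hat{F}_{(\alpha),\min} \ge (1+\epsilon)F_{(\alpha)}\right) \\
&= \exp\left(k \log\left(1 - \frac{1}{\pi}\int_0^{\pi} \exp\left(-\frac{g(\theta; \Delta)}{(1+\epsilon)^{1/\Delta}}\right) d\theta\right)\right) \\
&\le \exp\left(k \log\left(1 - \frac{1}{\pi}\int_0^{\theta_0} \left(1 - \frac{g(\theta; \Delta)}{(1+\epsilon)^{1/\Delta}}\right) d\theta\right)\right) \\
&\le \exp\left(k \log\left(1 - \frac{1}{\pi}\left[\theta_1 - \frac{1}{2}\theta_1\Delta + \frac{1}{2}(1-\Delta)(\theta_0 - \theta_1)\right]\right)\right) \\
&= \exp\left(k \log\left(1 - \frac{1}{2\pi}\left[\theta_0 + \theta_1 - \Delta\theta_0\right]\right)\right),
\end{aligned}$$

where

$$\begin{aligned}
&1 - \frac{1}{2\pi}\left[\theta_0 + \theta_1 - \Delta\theta_0\right] \\
&= 1 - \frac{1}{2}\left[\frac{\log\Delta + \frac{1}{\Delta}\log(1+\epsilon) + \log\left(\frac{1}{\Delta\log\Delta + \log(1+\epsilon)} + 1\right) + O(\Delta)}{1 + \log\Delta + \frac{1}{\Delta}\log(1+\epsilon) + \log\left(\frac{1}{\Delta\log\Delta + \log(1+\epsilon)} + 1\right) + O(\Delta)}\right. \\
&\qquad\qquad \left. + \ (1-\Delta)\frac{\frac{1}{\Delta}\log(1+\epsilon) + \log\left(\frac{1}{\log(1+\epsilon)} + 1\right) + O(\Delta)}{1 + \frac{1}{\Delta}\log(1+\epsilon) + \log\left(\frac{1}{\log(1+\epsilon)} + 1\right) + O(\Delta)}\right] \\
&= \frac{1}{2}\left[\Delta + \frac{\Delta}{\Delta + \Delta\log\Delta + \log(1+\epsilon) + \Delta\log\left(\frac{1}{\Delta\log\Delta + \log(1+\epsilon)} + 1\right) + O\left(\Delta^2\right)}\right. \\
&\qquad\qquad \left. + \ \frac{\Delta}{\Delta + \log(1+\epsilon) + \Delta\log\left(\frac{1}{\log(1+\epsilon)} + 1\right) + O\left(\Delta^2\right)}\right] + O\left(\Delta^2\right) \\
&= \frac{1}{2}\left[\Delta + \frac{\Delta}{\log(1+\epsilon)} + \frac{\Delta}{\Delta\log\Delta + \log(1+\epsilon)}\right] + O\left(\Delta^2\right).
\end{aligned}$$

Therefore, if we require

$$\Pr\left(\hat{F}_{(\alpha),\min} \ge (1+\epsilon)F_{(\alpha)}\right) \le \exp\left(k \log\left(\frac{1}{2}\left[\Delta + \frac{\Delta}{\log(1+\epsilon)} + \frac{\Delta}{\Delta\log\Delta + \log(1+\epsilon)}\right] + O\left(\Delta^2\right)\right)\right) \le \delta,$$

we obtain our main result, the sample complexity bound,

$$k \ge \frac{\log\frac{1}{\delta}}{\log\frac{1}{\Delta} - \log\left(\frac{1}{2} + \frac{1}{2\log(1+\epsilon)} + \frac{1}{2\Delta\log\Delta + 2\log(1+\epsilon)} + O(\Delta)\right)}.$$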
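It turns out that the term $\Delta\log\Delta$ can be almost removed by using one additional interior point when applying the trapezoid rule. Note that $|\Delta\log\Delta|$ is almost as small as $\Delta$, but we do not want to simply ignore this term.

Using two interior points, $\theta_1$ and $\theta_t$, where $0 < t < 1$, we obtain

$$\begin{aligned}
&\Pr\left(\hat{F}_{(\alpha),\min} \ge (1+\epsilon)F_{(\alpha)}\right) \\
&= \exp\left(k \log\left(1 - \frac{1}{\pi}\int_0^{\pi} \exp\left(-\frac{[\sin(\alpha\theta)]^{\alpha/\Delta}}{(1+\epsilon)^{1/\Delta}[\sin\theta]^{1/\Delta}} \sin(\theta\Delta)\right) d\theta\right)\right) \\
&\le \exp\left(k \log\left(1 - \frac{1}{\pi}\int_0^{\theta_0} \left(1 - \frac{[\sin(\alpha\theta)]^{\alpha/\Delta}}{(1+\epsilon)^{1/\Delta}[\sin\theta]^{1/\Delta}} \sin(\theta\Delta)\right) d\theta\right)\right) \\
&\le \exp\left(k \log\left(1 - \frac{1}{\pi}\left[\theta_1 - \frac{1}{2}\theta_1\Delta + \frac{1}{2}(\theta_t - \theta_1)\left(1 - \Delta + 1 - \Delta^t\right) + \frac{1}{2}\left(1 - \Delta^t\right)(\theta_0 - \theta_t)\right]\right)\right) \\
&= \exp\left(k \log\left(1 - \frac{1}{2\pi}\left[\theta_0 + \theta_t - \Delta\theta_t - \Delta^t\theta_0 + \Delta^t\theta_1\right]\right)\right),
\end{aligned}$$

where, substituting (12) for $\theta_0$, $\theta_t$, and $\theta_1$,

$$\begin{aligned}
&1 - \frac{1}{2\pi}\left[\theta_0 + \theta_t - \Delta\theta_t - \Delta^t\theta_0 + \Delta^t\theta_1\right] \\
&= \frac{1}{2}\left[\Delta + \frac{\Delta}{\Delta + \log(1+\epsilon) + \Delta\log\left(\frac{1}{\log(1+\epsilon)} + 1\right) + O\left(\Delta^2\right)}\right. \\
&\qquad\qquad \left. + \ \frac{\Delta}{\Delta + t\Delta\log\Delta + \log(1+\epsilon) + \Delta\log\left(\frac{1}{t\Delta\log\Delta + \log(1+\epsilon)} + 1\right) + O\left(\Delta^2\right)}\right] + \frac{\Delta^t}{2\pi}(\theta_0 - \theta_1) + O\left(\Delta^2\right) \\
&= \frac{1}{2}\left[\Delta + \frac{\Delta}{\log(1+\epsilon)} + \frac{\Delta}{t\Delta\log\Delta + \log(1+\epsilon)}\right] + O\left(\Delta^2\right).
\end{aligned}$$

Note that, if we choose $t$ to be too small (too close to 0), then $\left(-\Delta^t\theta_0 + \Delta^t\theta_1\right)$ will be larger than $O\left(\Delta^2\right)$ and can not be ignored. Therefore, although we can minimize the impact of the term $\Delta\log\Delta$ to a very large extent, it can not be entirely removed, theoretically speaking.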

5 Conclusion 

Real-world data are often dynamic and can be modeled as data streams. Measuring summary statistics of data streams such as the Shannon entropy has become an important task in many applications, for example, detecting anomaly events in large-scale networks. One line of active research is to approximate the Shannon entropy using the $\alpha$th frequency moments of the stream with $\alpha$ extremely close to 1.

Efficiently approximating the $\alpha$th frequency moments of data streams has been very heavily studied in theoretical computer science and databases. When $0 < \alpha \le 2$, it is well-known that efficient $O(1/\epsilon^2)$-space algorithms exist, for example, symmetric stable random projections [20, 24], which however are impractical for estimating Shannon entropy using $\alpha$ extremely close to 1. Recently, [25] provided an algorithm to achieve the $O(1/\epsilon)$ bound in the neighborhood of $\alpha = 1$, based on the idea of maximally-skewed stable random projections (also called Compressed Counting (CC)). The $O(1/\epsilon)$ bound, although a very large improvement over the previous $O(1/\epsilon^2)$ bound, is still impractical.

This study proposes a new algorithm for CC based on the sample minimum, which is simple, practical, and still has very nice theoretical properties. Using this algorithm, we have proved that the sample complexity is essentially $O\left(\frac{1}{\log\frac{1}{1-\alpha} - \log\frac{1}{\epsilon}}\right)$ as $\alpha \to 1-$. This is a very large improvement over the previous $O(1/\epsilon)$ bound and may impact the practice.



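A Proof of Lemma 3

For $\theta \in (0, \pi)$, let

$$g(\theta; \Delta) = \frac{[\sin(\alpha\theta)]^{\alpha/\Delta}}{[\sin\theta]^{1/\Delta}} \sin(\theta\Delta).$$

It is easy to show that, as $\theta \to 0+$,

$$\lim_{\theta\to0+} g(\theta; \Delta) = \lim_{\theta\to0+} \frac{[\sin(\alpha\theta)]^{\alpha/\Delta}}{[\sin\theta]^{1/\Delta}} \sin(\theta\Delta) = \lim_{\theta\to0+} \left(\frac{\sin(\alpha\theta)}{\sin\theta}\right)^{1/\Delta} \frac{\sin(\theta\Delta)}{\sin(\alpha\theta)} = \alpha^{1/\Delta}\frac{\Delta}{\alpha} = \Delta\alpha^{1/\Delta-1}.$$

The proof of the monotonicity of $g(\theta; \Delta)$ is omitted, because it can be inferred from the proof of the convexity.

To show $g(\theta; \Delta)$ is a convex function of $\theta$, it suffices to show it is log-convex. Since

$$g(\theta; \Delta) = \frac{[\sin(\alpha\theta)]^{\alpha/\Delta}}{[\sin\theta]^{1/\Delta}} \sin(\theta\Delta) = \frac{\sin(\theta\Delta)}{\sin(\alpha\theta)} \left[\frac{\sin(\alpha\theta)}{\sin\theta}\right]^{1/\Delta},$$

it suffices to show that both $\frac{\sin(\theta\Delta)}{\sin(\alpha\theta)}$ and $\left[\frac{\sin(\alpha\theta)}{\sin\theta}\right]^{1/\Delta}$ are log-convex.

For the first factor,

$$\frac{\partial}{\partial\theta}\left[\log\sin(\theta\Delta) - \log\sin(\alpha\theta)\right] = \Delta\frac{\cos(\theta\Delta)}{\sin(\theta\Delta)} - \alpha\frac{\cos(\alpha\theta)}{\sin(\alpha\theta)},$$

$$\frac{\partial^2}{\partial\theta^2}\left[\log\sin(\theta\Delta) - \log\sin(\alpha\theta)\right] = -\frac{\Delta^2}{\sin^2(\theta\Delta)} + \frac{\alpha^2}{\sin^2(\alpha\theta)} = \left(\frac{\alpha}{\sin(\alpha\theta)} - \frac{\Delta}{\sin(\theta\Delta)}\right)\left(\frac{\alpha}{\sin(\alpha\theta)} + \frac{\Delta}{\sin(\theta\Delta)}\right),$$

$$\frac{\partial\left[\alpha\sin(\theta\Delta) - \Delta\sin(\alpha\theta)\right]}{\partial\theta} = \Delta\alpha\left(\cos(\theta\Delta) - \cos(\alpha\theta)\right) \ge 0 \qquad \text{(because } \Delta < 0.5\text{)}.$$

Therefore, $\alpha\sin(\theta\Delta) - \Delta\sin(\alpha\theta) \ge 0$ and $\frac{\sin(\theta\Delta)}{\sin(\alpha\theta)}$ is log-convex.

For the second factor,

$$\frac{\partial}{\partial\theta}\left[\log\sin(\alpha\theta) - \log\sin(\theta)\right] = \alpha\frac{\cos(\alpha\theta)}{\sin(\alpha\theta)} - \frac{\cos(\theta)}{\sin(\theta)},$$

$$\frac{\partial^2}{\partial\theta^2}\left[\log\sin(\alpha\theta) - \log\sin(\theta)\right] = -\frac{\alpha^2}{\sin^2(\alpha\theta)} + \frac{1}{\sin^2(\theta)} = \left(\frac{1}{\sin(\theta)} - \frac{\alpha}{\sin(\alpha\theta)}\right)\left(\frac{1}{\sin(\theta)} + \frac{\alpha}{\sin(\alpha\theta)}\right),$$

$$\frac{\partial\left[\sin(\alpha\theta) - \alpha\sin(\theta)\right]}{\partial\theta} = \alpha\left(\cos(\alpha\theta) - \cos(\theta)\right) \ge 0, \qquad \text{because } \alpha = 1-\Delta > 0.5.$$

Therefore, we have proved the convexity of $g(\theta; \Delta)$.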

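B Proof of Equation (12)

$\theta_\gamma$ is the solution to

$$\Delta^{\gamma} = \frac{[\sin(\alpha\theta)]^{\alpha/\Delta}}{(1+\epsilon)^{1/\Delta}[\sin\theta]^{1/\Delta}} \sin(\theta\Delta).$$

Equivalently,

$$\gamma\log\Delta + \frac{1}{\Delta}\log(1+\epsilon) + \frac{1}{\Delta}\log\sin\theta = \frac{1-\Delta}{\Delta}\log\sin(\theta - \Delta\theta) + \log\sin(\Delta\theta),$$

$$\gamma\log\Delta + \frac{1}{\Delta}\log(1+\epsilon) + \log\frac{\sin(\theta - \Delta\theta)}{\sin(\Delta\theta)} = \frac{1}{\Delta}\log\frac{\sin(\theta - \Delta\theta)}{\sin\theta},$$

$$\gamma\log\Delta + \frac{1}{\Delta}\log(1+\epsilon) + \log\left(\sin\theta\frac{\cos(\Delta\theta)}{\sin(\Delta\theta)} - \cos\theta\right) = \frac{1}{\Delta}\log\left(\cos(\Delta\theta) - \sin(\Delta\theta)\frac{\cos\theta}{\sin\theta}\right).$$

We apply Taylor expansions,

$$\gamma\log\Delta + \frac{1}{\Delta}\log(1+\epsilon) + \log\left(\frac{-\sin\theta}{\Delta\theta\cos\theta} + \frac{\Delta\theta\sin\theta}{3\cos\theta} + 1 + \ldots\right) + \log(-\cos\theta) = -\frac{\Delta\theta^2}{2} - \frac{\theta\cos\theta}{\sin\theta} - \frac{\Delta\theta^2\cos^2\theta}{2\sin^2\theta} + \ldots$$

to obtain

$$\gamma\log\Delta + \frac{1}{\Delta}\log(1+\epsilon) + \log\left(\frac{-\sin\theta}{\Delta\theta\cos\theta} + O(\Delta) + 1\right) + O\left(\Delta^2\right) = -\frac{\theta\cos\theta}{\sin\theta} + O(\Delta),$$

where we have replaced $\log(-\cos\theta)$ with $O\left(\Delta^2\right)$ (as $\Delta \to 0$). This fact can be later verified.

Let $T = -\frac{\theta\cos\theta}{\sin\theta}$, $C = \gamma\log\Delta + \frac{1}{\Delta}\log(1+\epsilon)$. This requires us to solve a fixed point equation:

$$T = C + \log\left(\frac{1}{\Delta T} + O(\Delta) + 1\right) + O(\Delta).$$

We resort to an iterative method. Starting with $T^{(0)} = 1$,

$$T^{(1)} = C + \log\left(\frac{1}{\Delta} + O(\Delta) + 1\right) + O(\Delta) = C - \log(\Delta) + O(\Delta).$$

$$\begin{aligned}
T^{(2)} &= C + \log\left(\frac{1}{\Delta\left(C - \log(\Delta) + O(\Delta)\right)} + O(\Delta) + 1\right) + O(\Delta) \\
&= C + \log\left(\frac{1}{(\gamma-1)\Delta\log\Delta + \log(1+\epsilon) + O\left(\Delta^2\right)} + O(\Delta) + 1\right) + O(\Delta) \\
&= C + \log\left(\frac{1 + (\gamma-1)\Delta\log\Delta + \log(1+\epsilon) + O\left(\Delta^2\right)}{(\gamma-1)\Delta\log\Delta + \log(1+\epsilon) + O\left(\Delta^2\right)}\right) + O(\Delta).
\end{aligned}$$

$$\begin{aligned}
T^{(3)} &= C + \log\left(\frac{1}{\Delta C + \Delta\log\left(\frac{1}{(\gamma-1)\Delta\log\Delta + \log(1+\epsilon) + O\left(\Delta^2\right)} + 1\right) + O\left(\Delta^2\right)} + O(\Delta) + 1\right) + O(\Delta) \\
&= C + \log\left(\frac{1}{\gamma\Delta\log\Delta + \log(1+\epsilon) + O(\Delta)} + O(\Delta) + 1\right) + O(\Delta) \\
&= C + \log\left(\frac{1 + \gamma\Delta\log\Delta + \log(1+\epsilon) + O(\Delta)}{\gamma\Delta\log\Delta + \log(1+\epsilon) + O(\Delta)}\right) + O(\Delta).
\end{aligned}$$

$$\begin{aligned}
T^{(4)} &= C + \log\left(\frac{1}{\Delta C + \Delta\log\left(\frac{1}{\gamma\Delta\log\Delta + \log(1+\epsilon) + O(\Delta)} + 1\right) + O\left(\Delta^2\right)} + O(\Delta) + 1\right) + O(\Delta) \\
&= C + \log\left(\frac{1}{\gamma\Delta\log\Delta + \log(1+\epsilon) + O(\Delta)} + O(\Delta) + 1\right) + O(\Delta) \\
&= C + \log\left(\frac{1 + \gamma\Delta\log\Delta + \log(1+\epsilon) + O(\Delta)}{\gamma\Delta\log\Delta + \log(1+\epsilon) + O(\Delta)}\right) + O(\Delta).
\end{aligned}$$

At this point, we have reached an equilibrium. Therefore, we know

$$T = \gamma\log\Delta + \frac{1}{\Delta}\log(1+\epsilon) + \log\left(\frac{1}{\gamma\Delta\log\Delta + \log(1+\epsilon)} + 1\right) + O(\Delta).$$

Note that

$$T = -\frac{\theta\cos\theta}{\sin\theta} = \frac{\theta\cos(\pi-\theta)}{\sin(\pi-\theta)} = \theta\left(\frac{1}{\pi-\theta} - \frac{\pi-\theta}{3} + \ldots\right).$$

Thus, assuming $O(\pi - \theta_\gamma) = O(\Delta)$ (which can be verified), we obtain

$$\begin{aligned}
\theta_\gamma &= \pi\frac{\gamma\log\Delta + \frac{1}{\Delta}\log(1+\epsilon) + \log\left(\frac{1}{\gamma\Delta\log\Delta + \log(1+\epsilon)} + 1\right) + O(\Delta)}{1 + \gamma\log\Delta + \frac{1}{\Delta}\log(1+\epsilon) + \log\left(\frac{1}{\gamma\Delta\log\Delta + \log(1+\epsilon)} + 1\right) + O(\Delta)} \\
&= \pi\frac{\gamma\Delta\log\Delta + \log(1+\epsilon) + \Delta\log\left(\frac{1}{\gamma\Delta\log\Delta + \log(1+\epsilon)} + 1\right) + O\left(\Delta^2\right)}{\Delta + \gamma\Delta\log\Delta + \log(1+\epsilon) + \Delta\log\left(\frac{1}{\gamma\Delta\log\Delta + \log(1+\epsilon)} + 1\right) + O\left(\Delta^2\right)} \\
&= \pi - \pi\frac{\Delta}{\Delta + \gamma\Delta\log\Delta + \log(1+\epsilon) + \Delta\log\left(\frac{1}{\gamma\Delta\log\Delta + \log(1+\epsilon)} + 1\right) + O\left(\Delta^2\right)}.
\end{aligned}$$

To complete the proof, we must verify $O(\pi - \theta_\gamma) = O(\Delta)$ and $\log(-\cos(\theta_\gamma)) = O\left(\Delta^2\right)$. Indeed,

$$O(\pi - \theta_\gamma) = \pi\frac{\Delta}{\Delta + \gamma\Delta\log\Delta + \log(1+\epsilon) + \Delta\log\left(\frac{1}{\gamma\Delta\log\Delta + \log(1+\epsilon)} + 1\right) + O\left(\Delta^2\right)} = O(\Delta),$$

$$\log(-\cos(\theta_\gamma)) = \log(\cos(\pi - \theta_\gamma)) = \log(\cos(O(\Delta))) = \log\left(1 - O\left(\Delta^2\right)\right) = O\left(\Delta^2\right).$$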
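The fixed-point iteration used in this appendix converges quickly in practice. The following sketch iterates $T \leftarrow C + \log\left(\frac{1}{\Delta T} + 1\right)$ starting from $T^{(0)} = 1$, with all $O(\Delta)$ terms dropped. The values $\gamma = 1$, $\Delta = 10^{-5}$, $\epsilon = 10^{-3}$ and the function name are assumptions chosen for illustration.

```python
import math

def fixed_point_iterates(gamma, delta, eps, steps):
    """Iterates T^(0) = 1, T^(n+1) = C + log(1 / (Delta * T^(n)) + 1),
    where C = gamma * log(Delta) + log(1 + eps) / Delta."""
    c = gamma * math.log(delta) + math.log1p(eps) / delta
    t = 1.0
    history = [t]
    for _ in range(steps):
        t = c + math.log(1.0 / (delta * t) + 1.0)
        history.append(t)
    return history

# Assumed illustrative values: gamma = 1, Delta = 1e-5, eps = 1e-3.
history = fixed_point_iterates(gamma=1.0, delta=1e-5, eps=1e-3, steps=50)
for n in range(5):
    print(n, history[n])
```

Under these assumed values each step shrinks the error by roughly two orders of magnitude, which is consistent with the observation above that an equilibrium is reached after about four iterations.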